1 Biomark, Inc. 705 South 8th St., Boise, Idaho, 83702, USA
✉ Correspondence: Kevin E. See <Kevin.See@biomark.com>
A key step in developing a QRF model to predict fish capacities was selecting the habitat covariates to include in the model. Random forest models naturally incorporate interactions between correlated covariates, which is essential because nearly all habitat variables are correlated to some degree. However, we aimed to avoid overly redundant variables (i.e., variables that measure similar aspects of the habitat). Further, including too many covariates can result in overfitting (e.g., including as many covariates as data points).
To prevent overfitting, we pared down the more than 100 metrics that the CHaMP protocol generates to describe the quantity and quality of fish habitat at each survey site. To help decide which habitat metrics to include in the QRF model, we used the Maximal Information-Based Nonparametric Exploration (MINE) class of statistics (Reshef et al. 2011) to identify the habitat characteristics (covariates) most strongly associated with observed parr densities. Specifically, we calculated the maximal information coefficient (MIC), which measures the strength of the linear or non-linear association between two variables (Reshef et al. 2011), using the R package minerva (Filosi et al. 2019). The MIC value between each measured habitat characteristic and parr density informed our decisions about which habitat covariates to include in the QRF parr capacity model.
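For reference, the MIC of Reshef et al. (2011) for a sample $D$ of $n$ paired observations can be written as below. Here $I^{*}(D, n_x, n_y)$ denotes the largest mutual information achievable by any grid with $n_x$ columns and $n_y$ rows laid over the scatterplot of the two variables, and $B(n)$ caps the grid resolution. Reshef et al. (2011) use $B(n) = n^{0.6}$; this section does not state which setting was used in our analysis, so treat that value as an assumption.

```latex
\mathrm{MIC}(D) \;=\; \max_{n_x n_y < B(n)} \frac{I^{*}(D, n_x, n_y)}{\log \min\{n_x, n_y\}}
```

The normalization by $\log \min\{n_x, n_y\}$ bounds the statistic between 0 and 1, which is what allows MIC values to be compared across habitat metrics measured on different scales.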
Habitat metrics were first grouped into broad categories that included channel unit, complexity, cover, disturbance, riparian, size, substrate, temperature, water quality, and woody debris. Habitat metrics measuring volume or area were scaled to the wetted area of each site. Within each category, metrics were ranked according to their MIC value (Table 1 and Figure 1). Our strategy was to select the one or two variables with the highest MIC value within each category so that the selected covariates describe different aspects of rearing habitat (e.g., substrate, temperature, etc.). Additionally, we attempted to avoid covariates that were highly correlated with one another (Figure ??) while including covariates that can be directly influenced by restoration actions or have been shown to affect juvenile salmonid density.
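The ranking step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used for the analysis: the function name `top_by_category` and the parameter `n_keep` are hypothetical, and the input holds only a handful of rows copied from Table 1. It covers the within-category MIC ranking alone; the correlation screening and the judgment about restoration relevance were separate steps that are not represented here.

```python
from collections import defaultdict

# A few (category, abbreviation, MIC) rows copied from Table 1 for illustration.
mic_table = [
    ("Size", "MeanU", 0.346),
    ("Size", "WetWdth_Int", 0.332),
    ("Size", "BfWdthInt", 0.324),
    ("Temperature", "avg_aug_temp", 0.272),
    ("Temperature", "Elev_M", 0.262),
    ("Temperature", "aug_temp", 0.188),
    ("Substrate", "SubLT6", 0.237),
    ("Substrate", "SubLT2", 0.227),
    ("Substrate", "SubD16", 0.219),
]


def top_by_category(rows, n_keep=2):
    """Return the n_keep highest-MIC metrics within each category.

    `rows` is an iterable of (category, metric, mic) tuples. The result maps
    each category to a list of metric names, ordered from highest to lowest MIC.
    """
    grouped = defaultdict(list)
    for category, metric, mic in rows:
        grouped[category].append((metric, mic))

    shortlist = {}
    for category, metrics in grouped.items():
        # Rank metrics within the category by MIC, largest first.
        ranked = sorted(metrics, key=lambda pair: pair[1], reverse=True)
        shortlist[category] = [name for name, _ in ranked[:n_keep]]
    return shortlist


shortlist = top_by_category(mic_table, n_keep=2)
for category, metrics in shortlist.items():
    print(category, metrics)
```

A shortlist built this way is a starting point for selection. Candidates within and across categories would still need to be checked for redundancy before being passed to the QRF model.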
Filosi, M., R. Visintainer, and D. Albanese. 2019. Minerva: Maximal information-based nonparametric exploration for variable analysis.
Reshef, D. N., Y. A. Reshef, H. K. Finucane, S. R. Grossman, G. McVean, P. J. Turnbaugh, E. S. Lander, M. Mitzenmacher, and P. C. Sabeti. 2011. Detecting novel associations in large data sets. Science 334:1518–1524.
| Category | Name | Abbreviation | MIC |
|---|---|---|---|
| ChannelUnit | Channel Unit Frequency | CU_Freq | 0.241 |
| ChannelUnit | Fast Turbulent Frequency | FstTurb_Freq | 0.230 |
| ChannelUnit | Fast NonTurbulent Frequency | FstNT_Freq | 0.209 |
| ChannelUnit | Slow Water Frequency | SlowWater_Freq | 0.208 |
| ChannelUnit | Fast Turbulent Percent | FstTurb_Pct | 0.195 |
| ChannelUnit | ChnlUnitTotal_Ct | ChnlUnitTotal_Ct | 0.189 |
| ChannelUnit | Channel Unit Count | CU_Ct | 0.189 |
| ChannelUnit | Fast Turbulent Count | FstTurb_Ct | 0.178 |
| ChannelUnit | Slow Water Percent | SlowWater_Pct | 0.177 |
| ChannelUnit | Fast NonTurbulent Percent | FstNT_Pct | 0.169 |
| Complexity | Wetted Width To Depth Ratio Avg | WetWDRat_Avg | 0.247 |
| Complexity | Bankfull Width To Depth Ratio Avg | BfWDRat_Avg | 0.245 |
| Complexity | Wetted Depth SD | DpthWet_SD | 0.232 |
| Complexity | Wetted Channel Braidedness | WetBraid | 0.212 |
| Complexity | Bankfull Channel Braidedness | BfBraid | 0.211 |
| Complexity | Wetted Channel Qualifying Island Count | Wet_QIsland_Ct | 0.209 |
| Complexity | Bankfull Width CV | BfWdth_CV | 0.209 |
| Complexity | Bankfull Width To Depth Ratio CV | BfWDRat_CV | 0.202 |
| Complexity | Detrended Elevation SD | DetrendElev_SD | 0.196 |
| Complexity | Bankfull Channel Qualifying Island Count | Bf_QIsland_Ct | 0.193 |
| Cover | Fish Cover: Total | FishCovTotal | 0.225 |
| Cover | Fish Cover: None | FishCovNone | 0.224 |
| Cover | Fish Cover: LW | FishCovLW | 0.213 |
| Cover | Fish Cover: Terrestrial Vegetation | FishCovTVeg | 0.204 |
| Cover | Percent Undercut by Length | UcutLgth_Pct | 0.185 |
| Cover | Percent Undercut by Area | UcutArea_Pct | 0.184 |
| Cover | Fish Cover: Aquatic Vegetation | FishCovAqVeg | 0.166 |
| Cover | Fish Cover: Artificial | FishCovArt | 0.136 |
| Land Classification | Natural Class PCA 2 | NatPrin2 | 0.271 |
| Land Classification | Natural Class PCA 1 | NatPrin1 | 0.258 |
| Land Classification | Disturbance Class PCA 1 | DistPrin1 | 0.242 |
| Riparian | Riparian Cover: Understory | RipCovUstory | 0.206 |
| Riparian | RipCovUstoryNone | RipCovUstoryNone | 0.206 |
| Riparian | Riparian Cover: No Canopy | RipCovCanNone | 0.194 |
| Riparian | Riparian Cover: Some Canopy | RipCovCanSome | 0.194 |
| Riparian | Riparian Cover: Big Tree | RipCovBigTree | 0.184 |
| Riparian | Riparian Cover: Ground | RipCovGrnd | 0.182 |
| Riparian | RipCovGrndNone | RipCovGrndNone | 0.170 |
| Riparian | Riparian Cover: Woody | RipCovWood | 0.168 |
| Riparian | Riparian Cover: Non-Woody | RipCovNonWood | 0.166 |
| Riparian | Riparian Cover: Coniferous | RipCovConif | 0.164 |
| SideChannel | Bankfull Side Channel Width | BfSCWdth | 0.223 |
| SideChannel | Wetted Side Channel Width | WetSCWdth | 0.213 |
| SideChannel | Wetted Side Channel Percent By Area | WetSC_Pct | 0.209 |
| SideChannel | SCSm_Freq | SCSm_Freq | 0.153 |
| SideChannel | SCSm_Ct | SCSm_Ct | 0.153 |
| SideChannel | SC_Area_Pct | SC_Area_Pct | 0.153 |
| Size | Mean Annual Flow | MeanU | 0.346 |
| Size | Wetted Width Integrated | WetWdth_Int | 0.332 |
| Size | Bankfull Width Integrated | BfWdthInt | 0.324 |
| Size | Wetted Width Avg | WetWdth_Avg | 0.324 |
| Size | Drainage Area (Flowline) | CUMDRAINAG | 0.302 |
| Size | Bankfull Width Avg | BfWdth_Avg | 0.298 |
| Size | DpthThlwg_Avg | DpthThlwg_Avg | 0.280 |
| Size | Discharge | Q | 0.259 |
| Size | Bankfull Depth Avg | DpthBf_Avg | 0.245 |
| Size | Bankfull Depth Max | DpthBf_Max | 0.240 |
| Substrate | Substrate < 6mm | SubLT6 | 0.237 |
| Substrate | Substrate < 2mm | SubLT2 | 0.227 |
| Substrate | Substrate: D16 | SubD16 | 0.219 |
| Substrate | Substrate: Embeddedness Avg | SubEmbed_Avg | 0.204 |
| Substrate | Substrate: D50 | SubD50 | 0.197 |
| Substrate | Substrate Est: Sand and Fines | SubEstSandFines | 0.190 |
| Substrate | Substrate Est: Cobbles | SubEstCbl | 0.185 |
| Substrate | Substrate: D84 | SubD84 | 0.185 |
| Substrate | Substrate Est: Boulders | SubEstBldr | 0.183 |
| Substrate | Substrate: Embeddedness SD | SubEmbed_SD | 0.181 |
| Temperature | Avg. August Temperature | avg_aug_temp | 0.272 |
| Temperature | Elev_M | Elev_M | 0.262 |
| Temperature | August Temperature | aug_temp | 0.188 |
| Temperature | Solar Access: Summer Avg | SolarSummr_Avg | 0.186 |
| WaterQuality | Conductivity | Cond | 0.254 |
| WaterQuality | Alkalinity | Alk | 0.225 |
| WaterQuality | Drift Biomass | DriftBioMass | 0.000 |
| Wood | Large Wood Volume: Bankfull Slow Water | LWVol_BfSlow | 0.213 |
| Wood | Large Wood Volume: Wetted Slow Water | LWVol_WetSlow | 0.207 |
| Wood | Large Wood Frequency: Wetted | LWFreq_Wet | 0.199 |
| Wood | Large Wood Volume: Bankfull | LWVol_Bf | 0.189 |
| Wood | Large Wood Volume: Wetted Fast Turbulent | LWVol_WetFstTurb | 0.187 |
| Wood | Large Wood Frequency: Bankfull | LWFreq_Bf | 0.178 |
| Wood | Large Wood Volume: Bankfull Fast NonTurbulent | LWVol_BfFstNT | 0.175 |
| Wood | Large Wood Volume: Wetted | LWVol_Wet | 0.166 |
| Wood | Large Wood Volume: Wetted Fast NonTurbulent | LWVol_WetFstNT | 0.159 |
Figure 1: Barplots of MIC statistics, faceted by habitat category.
Figure 2: Barplot of MIC statistics, colored by habitat category.